The Gutenberg English Poetry Corpus: Exemplary Quantitative Narrative Analyses

Author

  • Arthur M. Jacobs
Abstract

This paper describes a corpus of about 3,000 English literary texts with about 250 million words extracted from the Gutenberg project that span a range of genres from both fiction and non-fiction written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative narrative analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC), which comprises over 100 poetic texts with around two million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author or various text-analytic metrics for George Eliot’s poem “How Lisa Loved the King” and James Joyce’s “Chamber Music,” concerning, e.g., lexical diversity or sentiment analysis. The GEPC is particularly suited for research in Digital Humanities, Computational Stylistics, or Neurocognitive Poetics, e.g., as training and test corpus for stimulus development and control in empirical studies.

Similar resources

Explorations in an English Poetry Corpus: A Neurocognitive Poetics Perspective

This paper describes a corpus of about 3000 English literary texts with about 250 million words extracted from the Gutenberg project that span a range of genres from both fiction and non-fiction written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative Narrative Analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC) which comp...

Machine Learning for Metrical Analysis of English Poetry

In this work we tackle the challenge of identifying rhythmic patterns in poetry written in English. Although poetry is a literary form that makes use of standard meters, usually repeated among different authors, we will see in this paper how performing such analyses is a difficult task in machine learning due to the unexpected deviations from such standard patterns. After breaking down some example...

Strategies Available for Translating Persian Epic Poetry: A Case of Shahnameh

This study tried to find the strategies applied in three English translations of the Battle of Rostam and Esfandiyar. To this aim, the source text (ST) was analyzed verse by verse with each verse being compared with its English translations to determine what procedures the translators had used to render the source text. Subsequently, the frequency of usage for each procedure was measured ...

Pinpointing the classifiers of English language writing ability: A discriminant function analysis approach

The major aim of this paper was to investigate the validity of language and intelligence factors for classifying Iranian English learners' writing performance. Iranian participants of the study took three tests for grammar, breadth, and depth of vocabulary, and two tests for verbal and narrative intelligence. They also produced a corpus of argumentative writ...

Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation

We employ statistical methods to analyze, generate, and translate rhythmic poetry. We first apply unsupervised learning to reveal word-stress patterns in a corpus of raw poetry. We then use these word-stress patterns, in addition to rhyme and discourse models, to generate English love poetry. Finally, we translate Italian poetry into English, choosing target realizations that conform to desired...

Publication date: 2018